Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics
Value-based reinforcement-learning algorithms provide state-of-the-art
results in model-free discrete-action settings, and tend to outperform
actor-critic algorithms. We argue that actor-critic algorithms are limited by
their need for an on-policy critic. We propose Bootstrapped Dual Policy
Iteration (BDPI), a novel model-free reinforcement-learning algorithm for
continuous states and discrete actions, with an actor and several off-policy
critics. Off-policy critics are compatible with experience replay, ensuring
high sample-efficiency, without the need for off-policy corrections. The actor,
by slowly imitating the average greedy policy of the critics, leads to
high-quality and state-specific exploration, which we compare to Thompson
sampling. Because the actor and critics are fully decoupled, BDPI is remarkably
stable, and unusually robust to its hyper-parameters. BDPI is significantly
more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete,
continuous and pixel-based tasks. Source code:
https://github.com/vub-ai-lab/bdpi
Comment: Accepted at the European Conference on Machine Learning 2019 (ECML)
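The actor update sketched in this abstract, slowly imitating the average greedy policy of several off-policy critics, can be illustrated with a few lines of NumPy. This is a minimal sketch under assumed names (`greedy_policy`, `actor_update`) and an assumed step size, not the paper's actual implementation:

```python
import numpy as np

def greedy_policy(q_values):
    # One-hot greedy policy derived from one critic's Q-values for a state.
    probs = np.zeros(len(q_values))
    probs[int(np.argmax(q_values))] = 1.0
    return probs

def actor_update(actor_probs, critic_q_values, rate=0.05):
    # Move the actor's distribution a small step toward the critics'
    # average greedy policy; `rate` is an illustrative imitation rate.
    target = np.mean([greedy_policy(q) for q in critic_q_values], axis=0)
    new_probs = (1.0 - rate) * np.asarray(actor_probs) + rate * target
    return new_probs / new_probs.sum()  # guard against rounding drift
```

Because each critic votes with its own greedy action, states where the critics disagree keep a spread-out actor distribution, which is the Thompson-sampling-like exploration the abstract refers to.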
A framework for reinforcement learning with autocorrelated actions
The subject of this paper is reinforcement learning. Policies are considered here that produce actions based on states and on random elements that are autocorrelated across subsequent time instants. Consequently, an agent learns from experiments that are distributed over time and potentially give better clues for policy improvement. Physical implementation of such policies, e.g. in robotics, is also less problematic, as it avoids making robots shake. This is in contrast to most RL algorithms, which add white noise to the control signal and thereby cause unwanted shaking of the robots. An algorithm is introduced here that approximately optimizes the aforementioned policies. Its efficiency is verified on four simulated learning control problems (Ant, HalfCheetah, Hopper, and Walker2D) against three other methods (PPO, SAC, ACER). The algorithm outperforms the others on three of these problems.
Comment: The 27th International Conference on Neural Information Processing (ICONIP 2020)
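The contrast drawn here between autocorrelated random elements and white noise can be made concrete with a standard first-order autoregressive (AR(1)) process; the paper's exact noise model may differ, so treat this as an assumed illustration:

```python
import numpy as np

def autocorrelated_noise(n_steps, n_dims, alpha=0.9, sigma=0.2, seed=None):
    # AR(1): xi[t] = alpha * xi[t-1] + sqrt(1 - alpha**2) * sigma * eps[t].
    # alpha in (0, 1) correlates noise across time instants; alpha = 0
    # recovers the usual white noise added to control.
    rng = np.random.default_rng(seed)
    noise = np.zeros((n_steps, n_dims))
    for t in range(1, n_steps):
        eps = rng.standard_normal(n_dims)
        noise[t] = alpha * noise[t - 1] + np.sqrt(1.0 - alpha**2) * sigma * eps
    return noise

# a_t = mu(s_t) + xi_t: consecutive actions vary gradually, so the
# control signal stays smooth and avoids making robots shake.
```

The `sqrt(1 - alpha**2)` factor keeps the stationary variance equal to `sigma**2`, so smoothing the noise does not change its overall magnitude.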
The Emergence of Norms via Contextual Agreements in Open Societies
This paper explores the emergence of norms in agent societies when agents play multiple, even incompatible, roles in their social contexts simultaneously and have limited interaction ranges. Specifically, this article proposes two reinforcement learning methods for agents to compute agreements on strategies for using common resources to perform joint tasks. The computation of norms by agents playing multiple roles in their social contexts has not been studied before. To make the problem even more realistic for open societies, we do not assume that agents share knowledge of their common resources, so they have to compute semantic agreements towards performing their joint actions. The paper reports on an empirical study of whether and how efficiently societies of agents converge to norms, exploring the proposed social learning processes with respect to different society sizes and the ways agents are connected. The reported results are very encouraging regarding both the speed of the learning process and the convergence rate, even in quite complex settings.
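As a rough, heavily simplified illustration of how reinforcement learning can drive a society toward a single convention (the paper's setting with multiple roles and semantic agreements is richer; every name and reward here is invented for the sketch):

```python
import random

def learn_norm(n_agents=10, n_strategies=3, episodes=5000,
               lr=0.1, eps=0.1, seed=0):
    # Agents are paired at random and each picks a strategy for a shared
    # resource; both are rewarded only when their strategies agree.
    # Independent Q-learning then pulls the population toward one norm.
    rng = random.Random(seed)
    q = [[0.0] * n_strategies for _ in range(n_agents)]
    for _ in range(episodes):
        a, b = rng.sample(range(n_agents), 2)
        acts = [rng.randrange(n_strategies) if rng.random() < eps
                else max(range(n_strategies), key=q[i].__getitem__)
                for i in (a, b)]
        reward = 1.0 if acts[0] == acts[1] else 0.0
        for i, act in zip((a, b), acts):
            q[i][act] += lr * (reward - q[i][act])
    # The emergent norm: each agent's preferred strategy after learning.
    return [max(range(n_strategies), key=qi.__getitem__) for qi in q]
```

With limited interaction ranges, the pairing above would be restricted to neighbors, which is exactly the society-connectivity dimension the empirical study varies.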
Deep Reinforcement Learning: An Overview
In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning, with a focus on the most-used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have successfully been combined with the reinforcement learning framework.
Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 201
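As one concrete instance of a deep architecture combined with the reinforcement learning framework, here is a minimal DQN-style convolutional Q-network in PyTorch; the layer sizes follow the common Atari setup and are assumptions, not taken from this chapter:

```python
import torch.nn as nn

class ConvQNetwork(nn.Module):
    # A CNN learns a representation from raw 84x84 pixel input; the final
    # linear layer maps it to one Q-value per discrete action.
    def __init__(self, in_channels=4, n_actions=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x):  # x: (batch, in_channels, 84, 84)
        return self.head(self.features(x))
```

The convolutional stack plays the representation-learning role the abstract describes, while the Q-value head ties it into the reinforcement learning loop.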
Learning flexible sensori-motor mappings in a complex network
Given the complex structure of the brain, how can synaptic plasticity explain the learning and forgetting of associations when these are continuously changing? We address this question by studying different reinforcement learning rules in a multilayer network in order to reproduce monkey behavior in a visuomotor association task. Our model can only reproduce the monkey's learning performance if the synaptic modifications depend on the pre- and postsynaptic activity and if the intrinsic level of stochasticity is low. This favored learning rule is based on reward-modulated Hebbian synaptic plasticity and shows the interesting feature that the learning performance does not substantially degrade when layers are added to the network, even for a complex problem.
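A generic form of the reward-modulated Hebbian rule favored by this study, in which the weight change depends on pre- and postsynaptic activity gated by the reward signal, can be sketched as follows (the paper's exact rule, baseline, and stochasticity model are not reproduced here):

```python
import numpy as np

def reward_modulated_hebbian(w, pre, post, reward, baseline, lr=0.01):
    # w: weights of shape (n_post, n_pre); pre/post: activity vectors.
    # The Hebbian term pre*post is modulated by how much the obtained
    # reward deviates from a running baseline (e.g. average past reward).
    return w + lr * (reward - baseline) * np.outer(post, pre)
```

Gating by `(reward - baseline)` strengthens associations that were active on better-than-expected trials and weakens them on worse-than-expected ones, which is what lets the network both learn and forget associations as they change.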
Learning Shapes Spontaneous Activity Itinerating over Memorized States
Learning is a process that helps create neural dynamical systems so that an appropriate output pattern is generated for a given input. Often, such a memory is considered to be included in one of the attractors in neural dynamical systems, depending on the initial neural state specified by an input. Neither the neural activity observed in the absence of inputs nor the changes caused in the neural activity when an input is provided were studied extensively in the past. However, recent experimental studies have reported the existence of structured spontaneous neural activity and its changes when an input is provided. With this background, we propose that memory recall occurs when the spontaneous neural activity changes to an appropriate output activity upon the application of an input, a phenomenon known as bifurcation in dynamical systems theory. We introduce a reinforcement-learning-based layered neural network model with two synaptic time scales; in this network, I/O relations are successively memorized when the difference between the time scales is appropriate. After the learning process is complete, the neural dynamics are shaped so that they change appropriately with each input. As the number of memorized patterns is increased, the spontaneous neural activity generated after learning shows itineration over the previously learned output patterns. This theoretical finding also shows remarkable agreement with recent experimental reports, in which spontaneous neural activity in the visual cortex without stimuli itinerates over patterns evoked by previously applied signals. Our results suggest that itinerant spontaneous activity can be a natural outcome of the successive learning of several patterns, and that it facilitates bifurcation of the network when an input is provided.
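What "two synaptic time scales" can look like may be sketched with a fast weight that tracks the current learning signal and a slow weight that consolidates it; the coupling and constants below are illustrative assumptions, not the paper's model:

```python
def two_timescale_update(w_fast, w_slow, dw, tau_fast=10.0, tau_slow=1000.0):
    # Fast component: quickly follows the current learning signal dw.
    w_fast = w_fast + (dw - w_fast) / tau_fast
    # Slow component: gradually integrates the fast one, retaining
    # earlier memories while new I/O relations are being acquired.
    w_slow = w_slow + (w_fast - w_slow) / tau_slow
    return w_fast, w_slow, w_fast + w_slow  # effective weight is the sum
```

When `tau_slow` is sufficiently larger than `tau_fast`, newly learned patterns do not overwrite older ones, matching the abstract's observation that an appropriate difference between the time scales is what allows successive memorization.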
Particle Swarm Optimization with Reinforcement Learning for the Prediction of CpG Islands in the Human Genome
BACKGROUND: Regions with abundant GC nucleotides, a high CpG number, and a length greater than 200 bp in a genome are often referred to as CpG islands. These islands are usually located at the 5' end of genes. Recently, several algorithms for the prediction of CpG islands have been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose here a new method called CPSORL to predict CpG islands, which consists of a complement particle swarm optimization algorithm combined with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools equipped with the sliding-window technique have been developed previously. However, the quality of their results seems to rely too much on the choices made for the window sizes, and thus these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL provides results of higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). A higher number of CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4%). CPSORL is an application for identifying promoter and TSS regions associated with CpG islands in the entire human genome. When compared to CpGcluster, the islands predicted by CPSORL covered a larger region in the TSS (12.2%) and promoter (26.1%) regions. If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered larger TSS (40.5%) and promoter (67.8%) regions than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density was 5.33% for CpG islands in the entire human genome.
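For reference, the classic sliding-window criteria that CpG-island predictors evaluate can be sketched as below, using the standard Gardiner-Garden and Frommer thresholds; CPSORL's particle-swarm and reinforcement-learning machinery for choosing the windows is not shown:

```python
def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    # Classic criteria: length >= 200 bp, GC content >= 50%, and an
    # observed/expected CpG ratio >= 0.6.
    seq = seq.upper()
    n = len(seq)
    if n < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / n
    exp_cpg = seq.count("C") * seq.count("G") / n  # expected CpG count
    obs_exp = seq.count("CG") / exp_cpg if exp_cpg > 0 else 0.0
    return gc >= min_gc and obs_exp >= min_obs_exp
```

The abstract's point is that fixed window sizes make such checks brittle, which is the degree of freedom CPSORL optimizes.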
Human–agent collaboration for disaster response
In the aftermath of major disasters, first responders are typically overwhelmed with large numbers of spatially distributed search and rescue tasks, each with their own requirements. Moreover, responders have to operate in highly uncertain and dynamic environments where new tasks may appear and hazards may be spreading across the disaster space. Hence, rescue missions may need to be re-planned as new information comes in, tasks are completed, or new hazards are discovered. Finding an optimal allocation of resources to complete all the tasks is a major computational challenge. In this paper, we use decision-theoretic techniques to solve the task allocation problem posed by emergency response planning and then deploy our solution as part of an agent-based planning tool in real-world field trials. By so doing, we are able to study the interactional issues that arise when humans are guided by an agent. Specifically, we develop an algorithm based on a multi-agent Markov decision process representation of the task allocation problem and show that it outperforms standard baseline solutions. We then integrate the algorithm into a planning agent that responds to requests for tasks from participants in a mixed-reality location-based game, called AtomicOrchid, that simulates disaster response settings in the real world. We then run a number of trials of our planning agent and compare it against a purely human-driven system. Our analysis of these trials shows that human commanders adapt to the planning agent by taking on a more supervisory role, and that providing humans with the flexibility of requesting plans from the agent allows them to perform more tasks more efficiently than using purely human interactions to allocate tasks. We also discuss how such flexibility could lead to poor performance if left unchecked.
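The decision-theoretic core of this kind of task-allocation planning is dynamic programming over a Markov decision process. A minimal single-agent value-iteration sketch follows; the paper's multi-agent MDP formulation is considerably richer:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # P: transitions, shape (n_actions, n_states, n_states); R: rewards,
    # shape (n_states, n_actions). Returns optimal values and a greedy
    # policy mapping each state to the best action.
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        q = R + gamma * np.einsum("asn,n->sa", P, v)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)
        v = v_new
```

In the emergency-response setting, states would encode task and responder status, actions would assign responders to tasks, and the spreading hazards would enter through the transition probabilities.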